Deep Comprehension, Generation And Translation Of Weather Forecasts (Weathra)
Abstract
Weather forecasts were found early to be a domain where automatic translation was possible (Kittredge, 1973). Everybody in the field knows that there is a computer in Montreal translating forecasts routinely between French and English (METEO). The weather domain has proven to be a fruitful domain for further research, as witnessed e.g. by the system for generating marine forecasts presented by Kittredge et al (1986), by the work by Goldberg et al (1988), by the system generating public weather reports in Bulgarian reported on by Mitkov (1991), and by the system translating Finnish marine forecasts into Swedish by Blåberg (1988).

The Swedish Weathra system to be presented in this paper explores the language and semantics of weather forecasts further, and it aims at deep comprehension of forecasts. Besides grammatical representations, Weathra uses representations of the meteorological raw facts and secondary facts, e.g. the fact that it will probably rain at a place where there is a low pressure area. It uses a representation of meteorological objects with their properties as frames in a database, and a graphic representation with the standard meteorological icons on a map, e.g. icons for sun, cloudy, rain, snow, thunderstorm, westerly winds, L(ow) and H(igh) pressure, and temperatures, e.g. 10-15. Weathra also features a dynamic discourse representation including the discourse objects which may be referred to by the words and anaphora in the text (cf Karttunen, 1976; Johnson & Kay, 1990). The discourse objects are regarded as instances of the (proto)types or concepts, which are also available as frames in a database.
The formal grammar, morphology and lexicon of Weathra are based on experience from the machine translation system Swetra (Sigurd & Gawronska, 1988), which is also written in Prolog (LPA MacProlog). The Weathra system can understand weather forecasts in a fairly deep sense, depict its comprehension in a map, answer questions about the main contents and consequences, translate English forecasts into Swedish ones and vice versa, and generate various forecast texts in English or Swedish.

Actes de COLING-92, Nantes, 23-28 août 1992, p. 749. Proc. of COLING-92, Nantes, Aug. 23-28, 1992.

The language of forecasts

Even a quick glance at the weather forecasts in newspapers shows that they are written in a special format, besides using a restricted vocabulary (the METEO system uses some 1000 words, and so does Weathra). There are in fact two basic styles in the forecasts: the telegraphic style, illustrated by

Sun; Cloudy; Windy; Cool; Morning fog; Thunderstorms in the coastal areas; High 20, Low 15; Westerly winds; Snow over Alps; Visibility moderate; Cloudy, little rain at first, brighter later

and a normal descriptive style, illustrated by

A low pressure area is moving towards Scandinavia. It is expected to reach Norway in the afternoon.

There is, in fact, also an informal personalized style, which may be illustrated by the following quotes from a British newspaper (the European):

Players in the Rugby League test match between Great Britain and Australia on Saturday may need longer studs and safe hands to tackle the tricky conditions at London's Wembley stadium. Sunseekers looking to top up their tan need look no further than southern Spain.

The Weathra system is primarily designed to treat the telegraphic and the descriptive styles.

The grammar of forecasts

Weathra includes two grammars motivated by the distinction between telegraphic weather phrases and full sentences.
Interestingly enough, the grammatical categories of the two kinds of expressions differ. The telegraphic phrase grammar can work with a superordinate category called nominal, which includes both nouns and adjectives. The noun phrases used in telegraphic grammar may be somewhat different. They may, for instance, lack articles, as evidenced by some of the examples mentioned above. The adjectives of the nominal category have a special marker (-t) in Swedish; cf English sun in coastal areas : sunny in coastal areas (Swedish: Sol i kustområdet : Soligt i kustområdet).

The telegraphic meteorological phrases lack finite verbs, but there is often a parallel full sentence with a future verb will be (available in the full sentence module).

1.a. Sunny in Wales
1.b. The weather will be sunny in...
2.a. Sun in the morning
2.b. There will be sun in...
3.a. High 20, Low 15
3.b. The temperature will be between 15 and 20
4.a. Visibility moderate
4.b. The visibility will be moderate
5.a. Probably rain
5.b. It will probably rain

The following is the basic rule showing how English weather utterances (ewutt) can be generated as phrases (ewph) or full sentences (ewsent).

    ewutt(T,F,S,[]) :- ewph(T,F,S,[]).
    ewutt(T,F,S,[]) :- ewsent(T,F,S,[]).

The basic rule for phrases is:

    ewph(A,[event(N),time(fut),advl(A),advl(B),co(C)]) -->
        eadv(A), enom(N), eadv(B), eco(C).

This rule generates e.g. In the morning mild; Rain in the evening; In Scotland gale in the afternoon; Rain and snow, i.e. nominal phrases with adverbial determiners before and/or after a nominal, which can be an adjective or a noun, single or coordinated. As can be seen, the rules render a list representation called functional event representation, where the event is the first term, followed by actors, time and adverbials.
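The effect of the phrase rule can also be illustrated outside Prolog. The following Python sketch is not the Weathra code; it only mimics ewph for a handful of phrases. It consumes an optional adverbial, a nominal, an optional second adverbial and an optional coordinated nominal, and returns a list shaped like the functional event representation. The word lists ADVERBIALS and NOMINALS and the function names are assumptions made for this illustration, and empty slots are shown as None.

```python
# Hypothetical mini-lexicon; the real Weathra lexicon holds about 1000 items.
ADVERBIALS = ("in the morning", "in the evening", "in the afternoon", "in scotland")
NOMINALS = ("mild", "rain", "gale", "snow", "sun", "sunny")


def _take(words, options):
    """Return (matched item or None, remaining words), trying longer items first."""
    for item in sorted(options, key=len, reverse=True):
        parts = item.split()
        if words[:len(parts)] == parts:
            return item, words[len(parts):]
    return None, words


def parse_phrase(text):
    """Parse '[adverbial] nominal [adverbial] [and nominal]' into a slot list."""
    words = text.lower().rstrip(".").split()
    adv_a, words = _take(words, ADVERBIALS)
    nominal, words = _take(words, NOMINALS)
    if nominal is None:
        return None  # no nominal: not a telegraphic phrase of this shape
    adv_b, words = _take(words, ADVERBIALS)
    coord = None
    if words[:1] == ["and"]:
        coord, words = _take(words[1:], NOMINALS)
        if coord is None:
            return None
    if words:
        return None  # unparsed material is left over
    return [("event", nominal), ("time", "fut"),
            ("advl", adv_a), ("advl", adv_b), ("co", coord)]
```

Under these assumptions, parse_phrase("Rain and snow") fills only the event and co slots, while a full sentence such as A low pressure area approaches Scandinavia is rejected (None) and would be left to the sentence grammar.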
There are normally no more than two adverbial phrases before or after the nominal. The last term co(C) takes care of cases of coordinated phrases or sentences. The first slot is used to indicate which constituent is in focus (first), information which is useful in the text generation process. The further division of the superordinate category nominal (enom) is illustrated by the following rules for English:

    enom(M) --> enp(Agr,M).
    enom(M) --> eap(M).
    enp(Agr,nom(H,Adj,Attr)) --> eap(Adj), en(Agr,H), epattr(Agr,Attr).

Note that noun phrases have to carry agreement information (Agr) into the post attributive expression (epattr), as it might be a relative clause where the inflection of the verb depends on the features of the head of the np, as in: light winds which turn west / light wind which turns west.

The following is one of the DCG rules generating and analyzing full sentences:

    ewsent(N,[event(V),actor(N),time(T),advl(A1),co(C)]) -->
        enp(Agr,N), evi(Agr,m(V,T)), eadv(A1), esco(C).

It can for instance generate the sentence: A low pressure area approaches Scandinavia. Another pattern is used in order to generate sentences such as The low pressure area will bring rain in Sweden, etc. There are about a dozen different syntactic structures to be found in the forecasts.

The lexicon has the same format as in Swetra. The first slot contains the form of the item (one or several words), the second the meaning written in "machinese", the third slot the grammatical category, and further slots may be used for various features and classifications. The following are some examples.

    slex([in],m(in),prep . . . . . . . . . . . . . loc).
    slex(["Skandinavien"],m(scandinavia,prop),n . . . . . . . . . . . . . loc).
    slex([på,eftermiddagen],m(in_the_afternoon),adv . . . . . . . . . . . time,dur).

The lexicon includes a great number of multiword items of the kind illustrated by på eftermiddagen. These fixed phrases are particularly common in special domains such as forecasts.

The words, concepts and objects of forecasts

Consider the following text, with several alternative second sentences.

A gale is moving towards Scandinavia.

This sentence may be rendered in the following way in order to reveal the concepts and objects involved, some of which can also be referred to. The potential referential objects are numbered (within parentheses).

Something (O1), which is an instance of the concept 'gale' (O2) and is denoted by the English word gale (O3), does something (O4) which is an instance of the concept of 'move' (O5) and is denoted by the English word move (O6). The movement has a direction (O7) which is an instance of the concept 'towards' (O8) and is denoted by the English word towards (O9). The goal (O10) of the movement has the proper name Scandinavia (O11) in English.

The following are some possible successive sentences, where the objects referred to are marked as O1, O2 etc.

It (O1) moves fast (1)
It (O4) happens fast (2)
It (O1) is better called a cyclone. (3)
It (O2) translates as "storm" in Swedish (4)
It (O9) is better spelled toward (5)
It (O10) includes Sweden (6)

We take a reference to prove that the object is a possible discourse object (discourse referent, to use Karttunen's term, 1976). We may consider discourse objects as individual temporary mental objects created primarily for the sake of communication. They can be denoted by a word and are classified as instances of prototypes (concepts) when they are denoted by generic words. The classification is done according to a set of permanent prototypes (concepts). What the speaker does is typically to create temporary discourse objects, define them as instances of certain types (unless proper names can be used, hopefully known to the listener) and say something about them.

The first object created in our sample text is (O1), and it is said to be an instance of the type denoted by 'gale' in English. It is said to do something which is an instance of 'movement', etc. The second sentence may refer to different objects introduced in the first sentence, even to objects denoted by verbs or sentences (It happens fast). If we are to elaborate this sentence we would say The movement (of the gale towards Scandinavia) happens fast, but one may also use a hyperonym as in The event happens fast and possibly The accident happens (or develops) fast. We may say that the same object is being referred to in the sentence It is a disaster, but in this case one may assume the existence of a type of object which has been called "inclusive referent" by Bonnie Webber.

Note the distinction between a temporary object created in order to think or say something and a permanent object such as 'gale'. Even a concept can be referred to, as illustrated by successive alternative (4) (It translates as "storm" in Swedish), where it must refer to the object O2, which is a concept, not the object O1. We may also note that a word used may be referred to, as illustrated by alternative (5) (It is better spelled toward), where it certainly refers to the word towards. If it were to refer to the concept, one would not use the word spelled but rather denoted.
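The bookkeeping described in this section can be pictured as a small table of discourse objects. The Python fragment below is only an illustration, not the Weathra implementation: the class DiscourseObject, the kind labels ("instance", "concept", "word", "name") and the helper functions are assumptions, while the numbering O1 to O11 follows the analysis of A gale is moving towards Scandinavia given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscourseObject:
    label: str      # O1, O2, ...
    kind: str       # "instance", "concept", "word" or "name" (assumed labels)
    content: str
    of: str = ""    # label of the object that this one classifies or denotes


OBJECTS = [
    DiscourseObject("O1", "instance", "something"),
    DiscourseObject("O2", "concept", "gale", of="O1"),
    DiscourseObject("O3", "word", "gale", of="O2"),
    DiscourseObject("O4", "instance", "doing"),
    DiscourseObject("O5", "concept", "move", of="O4"),
    DiscourseObject("O6", "word", "move", of="O5"),
    DiscourseObject("O7", "instance", "direction"),
    DiscourseObject("O8", "concept", "towards", of="O7"),
    DiscourseObject("O9", "word", "towards", of="O8"),
    DiscourseObject("O10", "instance", "goal"),
    DiscourseObject("O11", "name", "Scandinavia", of="O10"),
]


def referents(kind):
    """Labels of all objects of a given kind, i.e. candidate antecedents for 'it'."""
    return [obj.label for obj in OBJECTS if obj.kind == kind]


def concept_of(label):
    """Name of the concept that classifies an instance, or None if there is none."""
    for obj in OBJECTS:
        if obj.kind == "concept" and obj.of == label:
            return obj.content
    return None
```

With this table, a predicate such as is better spelled restricts it to the word objects (O3, O6, O9), and translates as restricts it to the concepts (O2, O5, O8); choosing among the remaining candidates is left open in this sketch. O10 is identified by a proper name rather than classified under a concept, so concept_of("O10") yields None.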
This survey is intended to clarify that it is generally necessary to keep track of the following types of objects and representations:

1) Meteorological objects
These include both objects proper, such as low pressure areas and other air masses, and episodes (states, events, and processes) describing phenomena such as rain, change of temperature, as well as locations and time intervals.

2) Discourse objects
Discourse objects can be described by meteorological objects, but they also have linguistic expressions. Not all meteorological objects whose existence is implied by a forecast describe discourse objects.

3) Grammatical representations
Grammatical representations refer to expressions signifying discourse objects.

The main levels of representation in Weathra are:

Level of meteorological objects
    air mass: gale
    doing: move
    speed: fast
    direction: Scandinavia

Level of discourse objects
    O1: gale
    O2: move
    O3: Scandinavia
    O4: (=O1)
    O5: move
    O6: fast

Level of functional event representation
    [event(move),actor(gale),time(pres),advl(toward,Scandinavia)],
    [event(move),actor(gale),time(pres),advl(fast)]

Text level
    A gale is moving towards Scandinavia. It moves fast.

The permanent concepts, which constitute frames with information, are only background objects alluded to by the words (cf Nirenburg & Defrise, 1991). The concept (frame) gale thus includes the information that a 'gale' has a speed and that this speed is between 20 and 30 meters per second, that it has a direction (which all speeds have), that it often leads to accidents at sea and along the coasts, etc.
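A frame of this kind can be pictured as a named bundle of slots. The Python sketch below is neither FLEX syntax nor taken from Weathra; the class Frame, the slot names and the functions fits and instantiate are hypothetical. Only the slot values (a speed between 20 and 30 meters per second, a direction, accidents at sea and along the coasts) come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """A permanent concept (prototype) with named slots."""
    name: str
    slots: dict = field(default_factory=dict)


GALE = Frame("gale", {
    "speed_m_per_s": (20, 30),   # speed range stated in the text
    "has_direction": True,       # every speed has a direction
    "often_leads_to": "accidents at sea and along the coasts",
})


def fits(frame, speed):
    """Check whether an observed speed lies within the frame's speed range."""
    low, high = frame.slots["speed_m_per_s"]
    return low <= speed <= high


def instantiate(frame, **facts):
    """Create a temporary instance: prototype slots plus forecast-specific facts."""
    instance = dict(frame.slots)
    instance["instance_of"] = frame.name
    instance.update(facts)
    return instance


# The discourse object O1 of the sample text, as an instance of the gale frame.
o1 = instantiate(GALE, direction="Scandinavia")
```

The instance copies the prototype's slots and adds what the forecast says about this particular gale, so the permanent frame itself stays unchanged.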
To an English-speaking person the concept 'gale' is known to be denoted by the word gale, to a Swede by storm, but that is not essential information in the concept 'gale'. The concept 'move' is to include the information that movement implies being at one place first and at another later, with a certain speed and direction. To those who are familiar with it, the concept 'Scandinavia' includes the information that this is a place and an area which covers Norway, Sweden, Denmark etc. Scandinavia is a proper name and not a generic noun, and something cannot be said to be an instance of the concept 'Scandinavia'. Concepts are stored as frames using the tool FLEX, which is available with LPA MacProlog.

Understanding and generating weather forecasts

The program allows a (telegraphic or full) sentence to be parsed by the grammar and lexicon, applying some implicational morphological procedures. This analysis renders a kind of functional representation as shown above. This representation is parsed by mapping procedures which look for depictable objects and places. Words such as sun, sunny result in a sun in the proper place in a map, rain results in the proper symbol, and westerly winds results in an arrow with the proper direction. Note that several words may result in the same symbol on the map. Sunny, sun and fair will all be represented by the icon "sun". The functional representation is also scanned by the concept finder, which looks for concepts about which it has information. Thus the frame 'gale' is used as the prototype of the instance O1 and 'move' is used as the prototype of O2. The meteorological finder looks for data for its general frames.

The system can be used for generation by placing a certain icon on the map and calling for generation.
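The mapping between words and map symbols can be sketched as a lookup table that is read in both directions. The Python fragment below is an assumption-laden illustration, not the Weathra mapping procedures: the icon names, the table PLACES, the tuple-list form of the functional representation and all function names are hypothetical.

```python
# Several words may share one icon (hypothetical icon names).
ICON_FOR_WORD = {
    "sun": "sun",
    "sunny": "sun",
    "fair": "sun",
    "rain": "rain",
    "westerly winds": "wind_arrow_from_west",
}

# Adverbials that name a depictable place (hypothetical entries).
PLACES = {"in southern sweden": "Southern Sweden", "in wales": "Wales"}


def depict(representation):
    """Find the icon and the place for one functional event representation."""
    event = next((v for k, v in representation if k == "event"), None)
    place = next((PLACES[v] for k, v in representation
                  if k == "advl" and v in PLACES), None)
    icon = ICON_FOR_WORD.get(event)
    return (icon, place) if icon and place else None


def words_for_icon(icon):
    """Inverse lookup: every word that is drawn as the given icon."""
    return [word for word, drawn in ICON_FOR_WORD.items() if drawn == icon]


def telegraphic(icon, place):
    """Candidate telegraphic phrases for an icon placed on the map."""
    return [f"{word.capitalize()} in {place}" for word in words_for_icon(icon)]
```

Reading the table forwards gives the comprehension direction (text to map); reading it backwards gives the candidate wordings for an icon that a user has placed on the map.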
Placing a sun icon on Southern Sweden, for instance, will result in a telegraphic utterance such as Sunny in Southern Sweden, Sun in Southern Sweden, Fair in Southern Sweden, or in a descriptive one such as It will be sunny..., There will be sun..., The weather will be fair... The generation process may be set to generate telegraphic or full utterances, single or coordinated utterances, and texts where the area is kept in focus, e.g. Wales will get sun and light winds, or coordinations with different areas such as Wales will get sun, but Cornwall will get rain. The procedures may also generate texts where the focus is on the weather type, as illustrated by There will be snow in Scotland and in the Midlands or There will be snow in Scotland, but rain in Wales.

Translation

The generation triggered by placing a meteorological icon on the map can be rendered in English or Swedish. Parsed and analyzed Swedish forecasts can be translated into English using the functional event representation. The functional representations of English and Swedish are very similar, and the few differences are handled by transfer rules.

References

Blåberg, O. Translating Finnish weather forecasts into Swedish. Dept of Linguistics, Umeå (1988)
Bourbeau, L., Carcagno, D., Goldberg, E., Kittredge, R. & Polguère, A. Bilingual generation of weather forecasts in an operations environment. Proc. Coling 90, Helsinki (1990)
Goldberg, E., Kittredge, R. & Polguère, A. Computer generation of marine weather forecast text. Journal of Atmospheric and Oceanic Technology, vol 5, no 4, 472-483 (1988)
Johnson, M. & Kay, M. Semantic abstraction and anaphora. Coling 90, Helsinki (1990)
Karttunen, L. Discourse referents. In: McCawley, J. (ed) Syntax and semantics 7. New York: Academic Press (1976), 363-385
Kittredge, R., et al. 'TAUM-73'. Montreal: Université de Montréal (1973)
Kittredge, R., Polguère, A. & Goldberg, E. Synthesizing weather forecasts from formatted data. Proc. of Coling 86, Bonn (1986)
Mitkov, R. Generating public weather reports. In: Yusoff, Z. Proceedings of the International Conference on Current Issues in Computational Linguistics, Penang, Malaysia (1991)
Nirenburg, S. & Defrise, C. Aspects of text meaning. In: J. Pustejovsky (ed) Semantics and the lexicon. Dordrecht: Kluwer (1991)
Sigurd, B. & Gawronska, B. The potential of SWETRA, a multilanguage MT-system. Computers and Translation 3 (1988), 238-250